Deep Q-learning networks (DQN) successfully combine reinforcement learning with deep neural networks and have driven the widespread adoption of reinforcement learning. When applying DQN or other reinforcement learning algorithms to real-world problems, data collection is a challenging issue; improving data efficiency is therefore one of the most important problems in reinforcement learning research. In this paper, we propose a framework that uses the max-mean loss in deep Q-networks (M$^2$DQN). Instead of sampling one batch of experiences per training step, we sample several batches from the experience replay buffer and update the parameters such that the maximum TD-error over these batches is minimized. The proposed framework can be combined with most existing DQN techniques by replacing the loss function. We verify its effectiveness with one of the most widely used DQN variants, double DQN (DDQN), on several Gym games. The results show that our method yields substantial improvements in both learning speed and performance.
translated by Google Translate
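The batch-selection step the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes TD-errors have already been computed per transition and simply picks, among several candidate mini-batches, the one with the largest mean absolute TD-error, which the gradient step would then minimize.

```python
import numpy as np

def sample_batches(buffer_size, batch_size, num_batches, rng):
    # Draw several candidate mini-batches of indices from the replay buffer.
    return [rng.choice(buffer_size, size=batch_size, replace=False)
            for _ in range(num_batches)]

def max_td_error_batch(td_errors, batches):
    # M^2DQN-style selection (a sketch): among the candidate batches,
    # identify the one with the largest mean absolute TD-error; the
    # update then minimizes this worst-case batch loss.
    means = [float(np.mean(np.abs(td_errors[idx]))) for idx in batches]
    return batches[int(np.argmax(means))]
```

In a full agent, `td_errors` would be recomputed with the current network for each sampled batch before the comparison.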
Language models demonstrate both quantitative improvements and new qualitative capabilities as they scale. Despite their potentially transformative impact, these new capabilities are poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and mitigate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers, spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity. Tasks that improve gradually and predictably tend to involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
We introduce an unsupervised motion-compensated reconstruction scheme for high-resolution free-breathing pulmonary MRI. We model the image frames in the time series as deformed versions of a 3D template image volume, and assume the deformation maps are points on a smooth manifold in high-dimensional space. Specifically, we model the deformation map at each time instant as the output of a CNN-based generator, whose weights are shared across all time frames, driven by a low-dimensional latent vector. The time series of latent vectors accounts for the dynamics in the dataset, including respiratory motion and bulk motion. The template image volume, the parameters of the generator, and the latent vectors are learned directly from the k-t space data in an unsupervised fashion. Our experimental results show improved reconstructions compared to state-of-the-art methods, especially in the context of bulk motion during the scan.
Deep learning algorithms that rely on extensive training data are revolutionizing image recovery from ill-posed measurements. Training data is scarce in many imaging applications, including ultra-high-resolution imaging. The deep image prior (DIP) algorithm was introduced for single-shot image recovery, completely eliminating the need for training data. A challenge with this scheme is the need for early stopping to minimize the overfitting of the CNN parameters to the noise in the measurements. We introduce a generalized Stein's unbiased risk estimate (GSURE) loss metric to minimize overfitting. Our experiments show that the GSURE-based method minimizes the overfitting problem, offering significantly improved performance over the classical DIP scheme. We also combine the GSURE-DIP approach with model-based unrolled architectures, which offers improved performance over direct-inversion schemes.
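The idea of replacing the measured loss with an unbiased risk estimate can be illustrated with plain Monte-Carlo SURE for Gaussian denoising. This is a simplified stand-in, not the paper's GSURE (which generalizes to undersampled/projected measurements): the divergence of the denoiser is estimated with a single random probe vector.

```python
import numpy as np

def mc_sure(y, denoiser, sigma, rng, eps=1e-3):
    # Monte-Carlo SURE: an unbiased estimate of the per-pixel MSE
    # (up to terms independent of the denoiser), so minimizing it
    # discourages fitting the noise -- the role GSURE plays for DIP.
    n = y.size
    fy = denoiser(y)
    b = rng.standard_normal(y.shape)
    # Divergence of the denoiser, estimated by a directional derivative.
    div = b.ravel() @ (denoiser(y + eps * b) - fy).ravel() / eps
    return (np.sum((fy - y) ** 2) / n
            - sigma ** 2
            + 2 * sigma ** 2 * div / n)
```

In a DIP setting, `denoiser` would be the CNN as a function of the measurements, and this loss would replace the plain data-fidelity term during training.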
Free-breathing cardiac MRI schemes are a competitive alternative to breath-held cine MRI protocols, enabling applicability to pediatric and other populations that cannot hold their breath. Because the data from the slices are acquired sequentially, the cardiac/respiratory motion patterns may differ from slice to slice; current free-breathing approaches perform an independent recovery of each slice. In addition to their inability to exploit inter-slice redundancies, manual intervention or sophisticated post-processing methods are needed to align the recovered images for quantification. To overcome these challenges, we propose an unsupervised variational deep manifold learning scheme for the joint alignment and reconstruction of multi-slice dynamic MRI. The scheme jointly learns the parameters of a deep network as well as a latent vector for each slice, which captures the motion-induced dynamic variations of the subject-specific k-t space data. The variational framework minimizes non-uniqueness in the representation, thus offering improved alignment and reconstruction.
We introduce an unsupervised deep manifold learning algorithm for motion-compensated dynamic MRI. We assume that the motion fields in a free-breathing lung MRI dataset lie on a manifold. The motion field at each time instant is modeled as the output of a deep generative model, driven by low-dimensional time-varying latent vectors that capture the temporal variability. The image at each time instant is modeled as a deformed version of an image template, using the above motion fields. The template, the parameters of the deep generator, and the latent vectors are learned from the k-t space data in an unsupervised fashion. The manifold motion model serves as a regularizer, making the joint estimation of the motion fields and images from a few radial spokes per frame well-posed. The utility of the algorithm is demonstrated in the context of motion-compensated high-resolution lung MRI.
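The generative motion model shared by the two abstracts above can be sketched in one dimension. This is a toy illustration only: the "generator" is a scalar function of the latent code rather than the papers' CNN, and the deformation is a simple translation of a 1-D template rather than a dense 3-D motion field.

```python
import numpy as np

def warp_1d(template, shift):
    # Deform a 1-D template by a (possibly fractional) translation;
    # this stands in for warping a 3-D template with a motion field.
    x = np.arange(template.size, dtype=float)
    return np.interp(x - shift, x, template)

def generate_frames(template, latents, generator):
    # Each time frame is the template deformed by a motion parameter
    # that the generator maps from a low-dimensional latent code.
    return np.stack([warp_1d(template, generator(z)) for z in latents])
```

In the actual schemes, the template, the generator weights, and the latent time series would all be optimized jointly against the k-t space data.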
This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing videos compressed by HEVC at a fixed QP, while Track 3 targets enhancing videos compressed by x265 at a fixed bit-rate. Besides, Tracks 1 and 3 aim to improve fidelity (PSNR), while Track 2 targets perceptual quality. The three tracks in total attracted 482 registrations. In the test phase, 12 teams, 8 teams, and 11 teams submitted final results for Tracks 1, 2, and 3, respectively. The proposed methods and solutions gauge the state of the art in video quality enhancement. The homepage of the challenge: https://github.com/renyang-home/ntire21_venh
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
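The teacher-student setup the abstract builds on can be sketched with the standard soft-label distillation objective. The abstract does not specify RELIANT's bias-mitigation term, so this shows only the base KD loss that such a framework would wrap: a temperature-softened KL divergence between teacher and student logits.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Soft-label knowledge distillation: KL(teacher || student) on
    # temperature-softened distributions, scaled by T^2 as is standard.
    # A fairness-aware scheme like RELIANT would add a debiasing term
    # on top of a base objective like this one.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return float(T * T * kl.mean())
```

Because the design of RELIANT is decoupled from the model structures, the logits here could come from any teacher/student GNN pair.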
In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average-reward as the discount factor $\gamma$ goes to $1$, and moreover, when $\gamma$ is large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward. We further design a robust dynamic programming approach, and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
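The robust relative value iteration described above can be sketched under the simplifying assumption of a small finite uncertainty set of transition kernels (the paper's uncertainty sets need not be finite, and this is not the authors' algorithm verbatim). The robust Bellman operator takes the worst case over the set at every state-action pair, and the value is normalized at a reference state so the gain can be read off.

```python
import numpy as np

def robust_relative_value_iteration(rewards, kernels, iters=1000, tol=1e-10):
    # rewards: (S, A); each kernel in `kernels`: (S, A, S) row-stochastic.
    S, A = rewards.shape
    h = np.zeros(S)        # relative value (bias) function
    gain = 0.0
    for _ in range(iters):
        # Robust Bellman operator: worst-case expected bias over the set.
        worst = np.min(np.stack([P @ h for P in kernels]), axis=0)  # (S, A)
        q = rewards + worst
        h_new = q.max(axis=1)
        gain = h_new[0]            # normalize at a reference state
        h_new = h_new - gain
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = q.argmax(axis=1)
    return gain, h, policy
```

With a single kernel in the set, this reduces to classical relative value iteration for average-reward MDPs.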
Human parsing aims to partition humans in image or video into multiple pixel-level semantic parts. In the last decade, it has gained significantly increased interest in the computer vision community and has been utilized in a broad range of practical applications, from security monitoring, to social media, to visual special effects, just to name a few. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions are still confusing. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, by introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page, to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.